Method and apparatus for rendering acoustic signal, and computer-readable recording medium

ABSTRACT

In cases of rendering a multichannel signal such as a 22.2 channel signal as a 5.1 channel signal, a three dimensional (3D) audio signal may be reproduced using a two dimensional (2D) output channel, but rendered audio signals are sensitively affected by a layout of speakers and may cause distortion of a sound image when the layout of arranged speakers is different from a standard layout. The present invention may solve the aforementioned problem of the prior art. The audio signal rendering method for reducing distortion of a sound image even when the layout of the arranged speakers is different from the standard layout, according to one embodiment of the present invention, includes: receiving a multi-channel signal including a plurality of input channels that are to be converted to a plurality of output channels; obtaining deviation information about at least one output channel, from a location of a speaker and a standard location corresponding to each of the plurality of output channels; and modifying a panning gain from a height channel included in the plurality of input channels to the output channel having the deviation information, based on obtained deviation information.

TECHNICAL FIELD

The inventive concept relates to a method and apparatus for rendering an audio signal, and more particularly, to a rendering method and apparatus for reproducing a location of a sound image and a tone color more accurately, by modifying a panning gain or a filter coefficient when there is a misalignment between a standard layout and an arrangement layout of output channels.

BACKGROUND ART

Stereophonic sound denotes sound to which spatial information is added, capable of reproducing a direction or a distance of a sound as well as its pitch and tone color, giving a listener an immersive feeling and allowing a listener who is not present in the space where the sound source occurred to experience directional, distance, and spatial perceptions.

When a channel signal such as a 22.2 channel signal is rendered as a 5.1 channel signal, a three-dimensional (3D) stereophonic sound may be reproduced using two-dimensional (2D) output channels, but the rendered audio signals are so sensitive to the layout of the speakers that sound image distortion may occur if the arrangement layout of the speakers is different from a standard layout.

DETAILED DESCRIPTION OF THE INVENTIVE CONCEPT

Technical Problem

As described above, when a channel signal such as a 22.2 channel signal is rendered as a 5.1 channel signal, a three-dimensional (3D) stereophonic sound may be reproduced using two-dimensional (2D) output channels, but the rendered audio signals are so sensitive to the layout of the speakers that sound image distortion may occur if the arrangement layout of the speakers is different from a standard layout.

To address this problem of the prior art, the inventive concept reduces sound image distortion even when a layout of installed speakers is different from a standard layout.

Technical Solution

In order to achieve the objective, the present invention includes the embodiments below.

An audio signal rendering method includes: receiving a multi-channel signal comprising a plurality of input channels that are to be converted to a plurality of output channels; obtaining deviation information about at least one output channel, from a location of a speaker corresponding to each of the plurality of output channels and a standard location; and modifying a panning gain from a height channel included in the plurality of input channels to the output channel having the deviation information, based on the obtained deviation information.

Advantageous Effects

According to the inventive concept, an audio signal may be rendered so as to reduce sound image distortion even if a layout of installed speakers is different from a standard layout or a location of a sound image has changed.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an internal structure of a stereophonic sound reproducing apparatus according to an embodiment;

FIG. 2 is a block diagram of a renderer in the stereophonic sound reproducing apparatus according to the embodiment;

FIG. 3 is a diagram of a layout of channels in a case where a plurality of input channels are down-mixed to a plurality of output channels, according to an embodiment;

FIG. 4 is a diagram of a panning unit in a case where a positional deviation occurs between a standard layout and an arrangement layout of output channels, according to an embodiment;

FIG. 5 is a diagram illustrating a configuration of a panning unit in a case where there is an elevation deviation between a standard layout and an arrangement layout of output channels, according to an embodiment;

FIG. 6 shows diagrams of locations of a sound image according to an arrangement layout of output channels, when a center channel signal is rendered from a left channel signal and a right channel signal;

FIG. 7 shows diagrams of localization of a sound image by correcting an elevation effect according to an embodiment, if there is an elevation deviation in output channels;

FIG. 8 is a flowchart illustrating a method of rendering a stereophonic audio signal, according to an embodiment;

FIG. 9 is a diagram showing an elevation deviation versus a panning gain with respect to each channel when a center channel signal is rendered from a left channel signal and a right channel signal, according to an embodiment;

FIG. 10 is a diagram showing spectra of tones at locations, according to a positional deviation between speakers;

FIG. 11 is a flowchart illustrating a method of rendering a stereophonic audio signal according to an embodiment;

FIG. 12 shows diagrams illustrating methods of designing sound quality correction filters, according to an embodiment;

FIG. 13 shows diagrams of examples in which an elevation deviation exists between output channels for 3D virtual rendering and a virtual sound source;

FIG. 14 is a diagram illustrating a method of virtually rendering a TFC channel by using L/R/LS/RS channels according to an embodiment; and

FIG. 15 is a block diagram of a renderer for processing a deviation in virtual rendering by using 5.1 output channels, according to an embodiment.

BEST MODE

In order to achieve the objective, the present invention includes the embodiments below.

According to an embodiment, there is provided an audio signal rendering method including: receiving a multi-channel signal comprising a plurality of input channels that are to be converted to a plurality of output channels; obtaining deviation information about at least one output channel, from a location of a speaker corresponding to each of the plurality of output channels and a standard location; and modifying a panning gain from a height channel included in the plurality of input channels to the output channel having the deviation information, based on the obtained deviation information.

The plurality of output channels may be horizontal channels.

The output channel having the deviation information may include at least one of a left horizontal channel and a right horizontal channel.

The deviation information may include at least one of an azimuth deviation and an elevation deviation.

The modifying of the panning gain may include correcting an effect caused by an elevation deviation, when the obtained deviation information includes the elevation deviation.

The modifying of the panning gain may correct the panning gain by a two-dimensional (2D) panning method, when the obtained deviation information does not include the elevation deviation.

The correcting of the effect caused by the elevation deviation may include correcting an inter-aural level difference (ILD) resulting from the elevation deviation.

The correcting of the effect caused by the elevation deviation may include modifying the panning gain of the output channel corresponding to the obtained elevation deviation, in proportion to the obtained elevation deviation.

A sum of square values of panning gains with respect to the left horizontal channel and the right horizontal channel may be 1.
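The unit-power constraint on the left and right panning gains can be illustrated with a minimal sketch. The function name `normalize_pair` and its argument names are hypothetical and do not appear in the specification; the sketch only shows one way to scale a gain pair so that the sum of its squares is 1.

```python
import math


def normalize_pair(g_left, g_right):
    """Scale two panning gains so that g_left**2 + g_right**2 == 1.

    Hypothetical helper: the specification states only the constraint,
    not how it is enforced.
    """
    norm = math.hypot(g_left, g_right)
    if norm == 0.0:
        raise ValueError("at least one gain must be non-zero")
    return g_left / norm, g_right / norm


# Equal raw gains become 1/sqrt(2) each after normalization.
g_l, g_r = normalize_pair(1.0, 1.0)
```

Keeping the sum of squared gains at 1 means that the total power sent to the two horizontal channels stays constant while the balance between them is adjusted.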

According to an embodiment, there is provided an apparatus for rendering an audio signal, the apparatus including: a receiver configured to receive a multi-channel signal including a plurality of input channels that are to be converted to a plurality of output channels; an obtaining unit configured to obtain deviation information about at least one output channel, from a location of a speaker corresponding to each of the plurality of output channels and a standard location; and a panning gain modifier configured to modify a panning gain from a height channel included in the plurality of input channels to the output channel having the deviation information, based on the obtained deviation information.

The plurality of output channels may be horizontal channels.

The output channel having the deviation information may include at least one of a left horizontal channel and a right horizontal channel.

The deviation information may include at least one of an azimuth deviation and an elevation deviation.

The panning gain modifier may correct an effect caused by an elevation deviation, when the obtained deviation information includes the elevation deviation.

The panning gain modifier may modify the panning gain by a two-dimensional (2D) panning method, when the obtained deviation information does not include the elevation deviation.

The panning gain modifier may correct an inter-aural level difference caused by the elevation deviation to correct an effect caused by the elevation deviation.

The panning gain modifier may modify a panning gain of an output channel corresponding to the elevation deviation, in proportion to the obtained elevation deviation, so as to correct an effect caused by the obtained elevation deviation.

A sum of square values of panning gains with respect to the left horizontal channel and the right horizontal channel may be 1.

According to an embodiment, there is provided a computer-readable recording medium having recorded thereon a computer program for executing the above method.

In addition, there are provided another method, another system, and a computer-readable recording medium having recorded thereon a computer program for executing the method.

Mode of the Inventive Concept

The detailed description of the invention refers to the attached drawings, which illustrate particular embodiments of the invention. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to one of ordinary skill in the art. It will be understood that various embodiments of the invention differ from each other but are not mutually exclusive.

For example, a particular shape, a particular structure, and a particular feature described in the specification may be changed from one embodiment to another embodiment without departing from the spirit and scope of the invention. Also, it will be understood that a position or layout of each element in each embodiment may be changed without departing from the spirit and scope of the invention. Therefore, the detailed descriptions should be considered in a descriptive sense only and not for purposes of limitation, and the scope of the invention is defined not by the detailed description of the invention but by the appended claims, and all differences within the scope will be construed as being included in the present invention.

Like reference numerals in the drawings denote like or similar elements throughout the specification. In the following description and the attached drawings, well-known functions or constructions are not described in detail since they would obscure the present invention with unnecessary detail.

Hereinafter, the present invention will be described in detail by explaining exemplary embodiments of the invention with reference to the attached drawings. The invention may, however, be embodied in many different forms, and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those of ordinary skill in the art.

Throughout the specification, when an element is referred to as being “connected to” or “coupled with” another element, it can be “directly connected to or coupled with” the other element, or it can be “electrically connected to or coupled with” the other element by having an intervening element interposed therebetween. Also, when a part “includes” or “comprises” an element, unless there is a particular description contrary thereto, the part can further include other elements, not excluding the other elements.

Hereinafter, the inventive concept will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an internal structure of a stereophonic sound reproducing apparatus according to an embodiment.

The stereophonic sound reproducing apparatus 100 according to an embodiment may output a multi-channel audio signal in which a plurality of input channels are mixed to a plurality of output channels for reproduction. Here, when the number of output channels is less than the number of input channels, the input channels are down-mixed according to the number of output channels.

Stereophonic sound denotes sound to which spatial information is added, giving a listener an immersive feeling by reproducing a direction or a feeling of distance of a sound, as well as an elevation and timbre of the sound, so that even a listener who is not present in the space where the sound source occurred may experience directional, distance, and spatial perceptions.

In the descriptions below, the number of output channels of an audio signal may denote the number of speakers that output sound. The more output channels there are, the more speakers output the sound. The stereophonic sound reproducing apparatus 100 according to the embodiment may render and mix a multi-channel audio input signal to the output channels that will reproduce the sound, so that a multi-channel audio signal having a large number of input channels may be output and reproduced in an environment where a smaller number of output channels are provided. Here, the multi-channel audio signal may include a channel capable of outputting an elevated sound.

The channel capable of outputting the elevated sound may denote a channel capable of outputting an audio signal via a speaker located above the head of a listener so that the listener may experience a feeling of elevation. A horizontal channel may denote a channel capable of outputting an audio signal via a speaker located on a horizontal plane with respect to the listener.

The above-described environment in which a smaller number of output channels are provided may denote an environment in which the sound may be output via speakers provided on a horizontal plane, without using an output channel capable of outputting the elevated sound.

In addition, in the descriptions below, a horizontal channel may denote a channel including an audio signal that may be output via a speaker provided on the horizontal plane. An overhead channel may denote a channel including an audio signal that may be output via a speaker that is provided at an elevated position, not on the horizontal plane, in order to output the elevated sound.

Referring to FIG. 1, the stereophonic sound reproducing apparatus 100 may include an audio core 110, a renderer 120, a mixer 130, and a post-processor 140.

The stereophonic sound reproducing apparatus 100 according to the embodiment may render, mix, and output a multi-channel input audio signal to the output channels used for reproduction. For example, the multi-channel input audio signal may be a 22.2 channel signal, and the output channels used for reproduction may be 5.1 or 7.1 channels. The stereophonic sound reproducing apparatus 100 performs rendering by designating the output channels to which the channels of the multi-channel input audio signal will correspond, mixes the rendered audio signals by combining the signals of the channels that correspond to each reproduction channel, and outputs a final signal.

An encoded audio signal is input to the audio core 110 in the format of a bitstream, and the audio core 110 decodes the input audio signal after selecting a decoder tool suitable for the encoded format of the audio signal.

The renderer 120 may render the multi-channel input audio signal to multi-channel output channels according to channels and frequencies. The renderer 120 may perform three-dimensional (3D) rendering and two-dimensional (2D) rendering on the multi-channel audio signal according to overhead channels and horizontal channels. A configuration of the renderer and a detailed rendering method will be described in more detail later with reference to FIG. 2.

The mixer 130 may mix the signals of the channels that the renderer 120 has mapped to the horizontal channels, and output the final signal. The mixer 130 may mix the signals of the respective channels for each predetermined section. For example, the mixer 130 may mix the signals of the respective channels in units of one frame.

The mixer 130 according to the embodiment may perform the mixing based on power values of the signals that are rendered to the respective reproduction channels. That is, the mixer 130 may determine the amplitude of the final signal, or a gain to be applied to the final signal, based on the power values of the signals rendered to the respective reproduction channels.
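One possible reading of power-based mixing is sketched below. It is an assumption, not the mixer described in the specification: the frames rendered to one reproduction channel are summed, and the sum is then rescaled so that its power equals the sum of the powers of the individual frames. The names `frame_power` and `power_preserving_mix` are hypothetical.

```python
import math


def frame_power(frame):
    """Mean-square power of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def power_preserving_mix(frames):
    """Mix equal-length frames rendered to the same output channel.

    Assumed rule: the gain applied to the summed signal is chosen so that
    the output power equals the sum of the input powers, which limits the
    level changes caused by interference between the channel signals.
    """
    length = len(frames[0])
    mixed = [sum(frame[i] for frame in frames) for i in range(length)]
    target = sum(frame_power(frame) for frame in frames)
    actual = frame_power(mixed)
    if actual == 0.0:
        return mixed  # silent or fully cancelled frame: nothing to rescale
    gain = math.sqrt(target / actual)
    return [gain * x for x in mixed]


# Two identical frames would double in amplitude if simply added;
# the power-based gain scales the result back to the summed input power.
out = power_preserving_mix([[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]])
```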

The post-processor 140 performs dynamic range control with respect to a multi-band signal and binauralization on an output signal of the mixer 130 so that the signal is suitable for the respective reproducing apparatus (speaker, headphones, etc.). An output audio signal from the post-processor 140 is output via a device such as a speaker, and the output audio signal may be reproduced in a 2D or 3D manner according to the process performed by each element.

The stereophonic sound reproducing apparatus 100 illustrated in FIG. 1 according to the embodiment is shown based on a configuration of an audio decoder, and other additional configurations are omitted.

FIG. 2 is a block diagram illustrating a configuration of the renderer in the stereophonic sound reproducing apparatus according to an embodiment.

The renderer 120 includes a filtering unit 121 and a panning unit 123.

The filtering unit 121 compensates for a tone or the like of a decoded audio signal according to a location, and may perform filtering of an input audio signal by using a head-related transfer function (HRTF) filter.

The filtering unit 121 may render an overhead channel that has passed through the HRTF filter in different manners according to a frequency thereof, in order to perform 3D rendering on the overhead channel.

The HRTF filter allows a stereophonic sound to be recognized by exploiting not only simple path differences, such as the inter-aural level difference (ILD) and the inter-aural time difference (ITD) that occur when a sound reaches the two ears, but also the phenomenon in which characteristics of complicated paths, such as diffraction at the surface of the head and reflection by the auricles, change depending on the direction from which the sound arrives. That is, the HRTF filter may process the audio signals included in the overhead channel by changing the sound quality of the audio signals so that the stereophonic sound may be recognized.

The panning unit 123 calculates and applies a panning coefficient that is to be applied to each frequency band and each channel, in order to pan the input audio signal with respect to each output channel. Panning of the audio signal denotes controlling the magnitude of the signal applied to each output channel, in order to render a sound source at a certain location between two output channels.

The panning unit 123 may render a low frequency signal among the overhead channel signals according to an add-to-the-closest channel method, and may render a high frequency signal according to a multichannel panning method. According to the multichannel panning method, a gain value, set differently for each channel to which a channel signal is rendered, is applied to the signal of each channel of the multichannel audio signal, so that each of the signals may be rendered to at least one horizontal channel. The signals of each channel to which the gain value is applied may be synthesized via mixing and may be output as a final signal.

Since the low frequency signal has a high diffractive property, the listener may perceive similar sound quality even if each channel in the multi-channel audio signal is rendered to only one channel, without being rendered to various channels according to the multi-channel panning method. Therefore, the stereophonic sound reproducing apparatus 100 according to the embodiment may render the low frequency signal according to the add-to-the-closest channel method, and thus may prevent the sound quality degradation that may occur when various channels are mixed to one output channel. That is, if various channels are mixed to one output channel, the signal may be amplified or attenuated by interference between the channel signals, degrading the sound quality; this degradation may be prevented by mixing one channel to one output channel.

According to the add-to-the-closest channel method, each channel of the multi-channel audio signal may be rendered to the closest channel from among the reproduction channels, instead of being rendered to various channels.
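The add-to-the-closest channel selection can be sketched as a nearest-azimuth lookup. The sketch below rests on assumptions not stated in the specification: closeness is measured by azimuth only, azimuth is positive to the left of the listener, and the surround speakers sit at ±110° (the text gives only 30° for the front left and right channels). `angular_distance`, `closest_channel`, and `LAYOUT_5_1` are hypothetical names.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def closest_channel(input_azimuth_deg, output_layout):
    """Return the name of the output channel nearest in azimuth."""
    return min(
        output_layout,
        key=lambda name: angular_distance(input_azimuth_deg, output_layout[name]),
    )


# Assumed 5.1 azimuths (degrees, positive to the left); only the 30 degree
# front angles come from the text, the surround angles are illustrative.
LAYOUT_5_1 = {"FL": 30.0, "C": 0.0, "FR": -30.0, "SL": 110.0, "SR": -110.0}

# A low frequency signal of an input channel at 45 degrees left is sent
# entirely to the front left channel instead of being panned.
target = closest_channel(45.0, LAYOUT_5_1)
```

An input channel that is equidistant from two output channels needs a tie-breaking rule, which this sketch leaves to dictionary order.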

Also, the stereophonic sound reproducing apparatus 100 performs the rendering operation differently according to frequency, thereby enlarging the sweet spot without degrading the sound quality. That is, the low frequency signal having a high diffractive property is rendered according to the add-to-the-closest channel method in order to prevent the sound quality degradation that may occur when various channels are mixed to one output channel. The sweet spot denotes a predetermined range in which the listener may optimally listen to the stereophonic sound without distortion.

As the sweet spot becomes larger, the listener may optimally listen to the undistorted stereophonic sound within a larger range. Conversely, a listener outside the sweet spot may hear sound whose sound quality or sound image has been distorted.

FIG. 3 is a diagram of a layout of channels in a case where a plurality of input channels are down-mixed to a plurality of output channels, according to an embodiment.

Technology for providing a stereophonic sound together with a stereoscopic image has been under development in order to provide a user with realism and an immersive feeling that are equal to or more exaggerated than reality. A stereophonic sound denotes that an audio signal itself has an elevation of sound and spatiality, and in order to reproduce the stereophonic sound, two or more loudspeakers, that is, output channels, are necessary. Also, a large number of output channels are necessary in order to accurately reproduce feelings of elevation, distance, and spatiality of the sound, except for a binaural stereophonic sound using an HRTF.

Therefore, various multi-channel systems such as a 5.1-channel system, the Auro 3D system, the Holman 10.2 channel system, the ETRI/Samsung 10.2 channel system, the NHK 22.2 channel system, etc., in addition to a stereo system having two output channels, have been suggested and developed.

FIG. 3 is a diagram illustrating an example in which a stereophonic audio signal of 22.2 channels is reproduced by a 5.1-channel output system.

A 5.1-channel system is a general name for a 5-channel surround multi-channel sound system, and has been widely distributed and used as a home theater in households and as a sound system for theaters. Every 5.1-channel layout includes a front left (FL) channel, a center (C) channel, a front right (FR) channel, a surround left (SL) channel, and a surround right (SR) channel. As shown in FIG. 3, since the output channels of the 5.1-channel system are placed on the same horizontal plane, the 5.1-channel system physically corresponds to a 2D system. In order for the 5.1-channel system to reproduce stereophonic audio signals, a rendering process for giving a 3D effect to the signal to be reproduced has to be performed.

The 5.1-channel system is widely used in various fields such as digital versatile disc (DVD) video, DVD sound, super audio compact disc (SACD), and digital broadcasting, as well as in movies. However, although the 5.1-channel system provides improved spatiality compared with the stereo system, it has many restrictions in forming a wider listening space. In particular, the 5.1-channel system forms a narrow sweet spot and may not provide a vertical sound image having an elevation angle, and thus, the 5.1-channel system may not be suitable for a wide listening space, e.g., a theater.

A 22.2-channel system suggested by NHK includes three layers of output channels. An upper layer includes Voice of God (VOG), T0, T180, TL45, TL90, TL135, TR45, TR90, and TR135 channels. Here, in the name of each channel, an index T denotes the upper layer, indexes L and R respectively denote left and right, and the number at the rear denotes an azimuth angle from a center channel.

A middle layer is on the same plane as the 5.1 channels, and includes ML60, ML90, ML135, MR60, MR90, and MR135 channels in addition to the output channels of the 5.1 channels. Here, in the name of each channel, an index M at the front means the middle layer, and the number at the rear denotes an azimuth angle from a center channel.

A low layer includes L0, LL45, and LR45 channels. Here, an index L at the front of the name of each channel denotes the low layer, and the number at the rear denotes an azimuth angle from a center channel.
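The naming convention described above (a layer index T, M, or L, an optional side index L or R, and an azimuth in degrees) can be decoded mechanically. The following sketch is illustrative only: `parse_channel` is a hypothetical helper, the convention that left azimuths are positive is an assumption, and VOG is handled as a special case with no azimuth because its name does not follow the pattern. The 5.1 channel names (FL, C, FR, SL, SR) are not covered.

```python
import re

LAYERS = {"T": "upper", "M": "middle", "L": "low"}
_NAME = re.compile(r"^([TML])([LR]?)(\d+)$")


def parse_channel(name):
    """Split a 22.2 channel name into (layer, signed azimuth in degrees).

    Assumed sign convention: left is positive, right is negative.
    """
    if name == "VOG":
        return ("upper", None)  # directly overhead, no azimuth in the name
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized channel name: {name!r}")
    layer, side, degrees = match.groups()
    azimuth = float(degrees)
    if side == "R":
        azimuth = -azimuth
    return (LAYERS[layer], azimuth)


upper_left = parse_channel("TL45")    # upper layer, 45 degrees to the left
low_right = parse_channel("LR45")     # low layer, 45 degrees to the right
```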

In the 22.2 channels, the middle layer is referred to as a horizontal channel, and the VOG, T0, T180, M180, L, and C channels having an azimuth angle of 0° or 180° are referred to as vertical channels.

When a 22.2 channel input signal is reproduced via the 5.1 channel system, the most general scheme is to distribute signals to channels by using a down-mix formula. Alternatively, an audio signal having an elevation may be reproduced through the 5.1-channel system by performing rendering to provide a virtual elevation.

FIG. 4 illustrates a panning unit according to an embodiment in a case where a positional deviation occurs between a standard layout and an arrangement layout of output channels.

When a multichannel input audio signal is reproduced by using a smaller number of output channels than the number of channels of the input signal, the original sound field may be distorted, and various techniques for compensating for the distortion are being researched.

General rendering techniques assume that the speakers, that is, the output channels, are arranged according to the standard layout. However, when the output channels are not arranged to accurately match the standard layout, distortion of the location of a sound image and distortion of a tone occur.

The distortion of the sound image broadly includes distortion of the elevation and distortion of a phase angle, which are not readily perceived at relatively low levels. However, due to the physical characteristic of the human body that the two ears are located on the left and right sides, if sound images on the left, center, and right sides are changed, the distortion of the sound image may be sensitively perceived. In particular, a sound image at the front may be perceived even more sensitively.

Therefore, as shown in FIG. 3, when the 22.2 channels are realized by using the 5.1 channels, it is particularly required not to change the sound images of the VOG, T0, T180, M180, L, and C channels located at 0° or 180°, rather than those of the left and right channels.

When an audio input signal is panned, two processes are basically performed. The first process is an initializing process in which a panning gain with respect to an input multichannel signal is calculated according to a standard layout of output channels. In the second process, the calculated panning gain is modified based on the layout in which the output channels are actually arranged. After the panning gain modifying process is performed, a sound image of an output signal may be present at a more accurate location.

Therefore, in order for the panning unit 123 to perform processing, information about the standard layout of the output channels and information about the arrangement layout of the output channels are required, in addition to the audio input signal. In a case where the C channel is rendered from the L channel and the R channel, the audio input signal indicates an input signal to be reproduced via the C channel, and an audio output signal indicates modified panning signals output from the L channel and the R channel according to the arrangement layout.

FIG. 5 is a diagram of a configuration of a panning unit according to an embodiment in a case where there is an elevation deviation between a standard layout and an arrangement layout of the output channels.

The 2D panning method that only takes into account the azimuth deviation, as shown in FIG. 4, may not correct an effect caused by an elevation deviation if there is an elevation deviation between the standard layout and the arrangement layout of the output channels. Therefore, if there is an elevation deviation between the standard layout and the arrangement layout of the output channels, the elevation rising effect due to the elevation deviation has to be compensated for by an elevation effect compensator 124 as shown in FIG. 5.

In FIG. 5, the elevation effect compensator 124 and the panning unit 123 are shown as separate elements, but the elevation effect compensator 124 may be implemented as an element included in the panning unit 123.

Hereinafter, a method of determining a panning coefficient according to a layout of speakers is described in detail with reference to FIGS. 6 to 9.

FIG. 6 shows diagrams of a location of a sound image according to an arrangement layout of output channels, in a case where a center channel signal is rendered from a left channel signal and a right channel signal.

In FIG. 6, it is assumed that a C channel is rendered from the L channel and the R channel.

In FIG. 6A, the L channel and the R channel are located on the same plane while having azimuth angles of 30° to the left and right sides of the C channel according to the standard layout. In this case, a C channel signal is rendered only by the gain obtained through the initialization of the panning unit 123 and is located at the regular position, and thus, there is no need to additionally modify the panning gain.

In FIG. 6B, the L channel and the R channel are located on the same plane as in FIG. 6A, and the location of the R channel matches the standard layout, whereas the L channel has an azimuth angle of 45°, which is greater than 30°. That is, the L channel has an azimuth deviation of 15° with respect to the standard layout.

In the above case, the panning gain calculated through the initialization process is the same with respect to the L channel and the R channel, and when the panning gain is applied, the location of the sound image is determined to be C′ that is biased toward the R channel. The above phenomenon occurs because the ILD varies depending on a change in the azimuth angle. When the azimuth angle is defined as 0° based on the location of the C channel, the level difference (ILD) of the audio signals reaching the two ears of a listener increases as the azimuth angle increases.

Therefore, the azimuth deviation has to be compensated for by modifying the panning gain according to the 2D panning method. In the case shown in FIG. 6B, the signal of the R channel is increased or the signal of the L channel is reduced so that the sound image may be formed at the location of the C channel.
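The specification does not name a particular 2D panning law. As one illustration, the sketch below uses a vector-based amplitude panning formulation, which is an assumption swapped in for the unspecified method: the two gains are chosen so that the gain-weighted sum of the speaker direction vectors points at the target, and are then scaled so that the sum of their squares is 1. The function name `pan_2d` and the convention that azimuth is positive to the left are hypothetical.

```python
import math


def pan_2d(target_deg, left_deg, right_deg):
    """Return power-normalized (g_left, g_right) for a target azimuth.

    Solves g_left * u(left) + g_right * u(right) = u(target), where
    u(angle) = (cos angle, sin angle), then normalizes the gain pair.
    """
    t = math.radians(target_deg)
    a = math.radians(left_deg)
    b = math.radians(right_deg)
    det = math.cos(a) * math.sin(b) - math.sin(a) * math.cos(b)
    if abs(det) < 1e-12:
        raise ValueError("speakers must not be collinear with the listener")
    g_left = (math.cos(t) * math.sin(b) - math.sin(t) * math.cos(b)) / det
    g_right = (math.cos(a) * math.sin(t) - math.sin(a) * math.cos(t)) / det
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm


# Standard layout (FIG. 6A): L at +30 and R at -30 degrees give equal gains.
std_l, std_r = pan_2d(0.0, 30.0, -30.0)

# Arrangement of FIG. 6B: L moved out to +45 degrees, R still at -30 degrees.
dev_l, dev_r = pan_2d(0.0, 45.0, -30.0)
```

Under these assumptions, the first call corresponds to the initialization with the standard layout and the second to the modification for the arranged layout; the R gain ends up larger than the L gain, in line with the correction described for FIG. 6B.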

FIG. 7 shows diagrams of localization of the sound image by compensating for the elevation effect according to an embodiment, when there is an elevation deviation between the output channels.

FIG. 7A shows a case in which the R channel is arranged at a location R′ that has an azimuth angle of 30°, which satisfies the standard layout, but that is not located on the same plane as the L channel and has an elevation angle of 30° from the horizontal channel. In this case, if the same panning gain is applied to the R channel and the L channel, the sound image C′, which is shifted by the change in the ILD caused by the raised elevation of the R channel, is not located at the center between the L channel and the R channel, but is biased toward the L channel.

This is because the ILD changes with the rise in elevation, as in the case where there is an azimuth deviation. If the elevation angle is defined to be 0° based on the horizontal channel, the level difference (ILD) of the audio signals reaching the two ears of the listener is reduced as the elevation angle increases. Therefore, C′ is biased toward the L channel, which is the horizontal channel (having no elevation angle).

Therefore, the elevation effect compensator 124 compensates for the ILD of the sound having the elevation angle in order to prevent bias of the sound image. In more detail, the elevation effect compensator 124 increases the panning gain of the channel having the elevation angle so as to prevent the bias of the sound image and to form the sound image at the azimuth angle of 0°.

FIG. 7B shows a location of the sound image that is localized through the compensation of the elevation effect. The sound image before compensation of the elevation effect is located at C′, that is, a position biased toward the channel having no elevation angle, as shown in FIG. 7A. However, when the elevation effect is compensated for, the sound image may be localized so as to be positioned at the center between the L channel and the R′ channel.

FIG. 8 is a flowchart illustrating a method of rendering a stereophonic audio signal, according to an embodiment.

The method of rendering the stereophonic audio signal illustrated with reference to FIGS. 6 and 7 is performed in the following order.

The renderer 120, in particular, the panning unit 123, receives a multi-channel input signal having a plurality of channels (810). For panning the received multi-channel input signal through multi-channel output, the panning unit 123 obtains deviation information about each of the output channels by comparing locations where the speakers corresponding to the output channels are arranged with standard output locations (820).

Here, if the output channels are 5.1 channels, the output channels are horizontal channels located on the same plane.

Deviation information may include at least one of information about an azimuth deviation and information about an elevation deviation. The information about the azimuth deviation may include the azimuth angle formed by a center channel and output channels on the horizontal plane where the horizontal channels exist, and the information about the elevation deviation may include an elevation angle formed by the horizontal plane on which the horizontal channels exist and the output channel.

The panning unit 123 obtains a panning gain that is to be applied to the input multi-channel signal, based on the standard output location (830). Here, an order of the obtaining of the deviation information (820) and the obtaining of the panning gain (830) may be switched.

If the deviation information obtained in operation 820 indicates that a deviation exists in an output channel, the panning gain obtained in operation 830 has to be modified. In operation 840, it is determined whether there is an elevation deviation based on the deviation information obtained in operation 820.

If the elevation deviation does not exist, the panning gain is modified only by taking into account the azimuth deviation (850).

There may be various methods of calculating and modifying the panning gain. Representatively, a vector base amplitude panning (VBAP) method based on amplitude panning or a tangent law may be used. Alternatively, in order to address the problem that the sweet spot has a narrow range, a method based on wave field synthesis (WFS) may be used, which may provide a relatively wide sweet spot by matching time delays of multiple speakers used in a reproduction environment so as to generate a waveform similar to a plane wave on a horizontal plane.

Alternatively, when transient signals such as the sound of rain or clapping and signals from various channels are down-mixed to one channel, the number of transient signals in that channel increases and a tone distortion such as whitening may occur. To address this problem, a hybrid virtual rendering method may be applied that performs the rendering process after selecting a 2D (timbral) or 3D (spatial) rendering mode according to the importance of spatial perception and sound quality in each scene.

Alternatively, a rendering method may be used that combines a virtual rendering for providing spatial perception with a technique using an active down-mix that improves sound quality by preventing comb-filtering during a down-mix process.

If there is the elevation deviation, the panning gain is modified while taking into account the elevation deviation (860).

Here, the modifying of the panning gain taking into account the elevation deviation includes a process of compensating for the rising effect according to the increase in the elevation angle, that is, modifying the panning gain so as to compensate for the ILD that is reduced as the elevation increases.

After the panning gain is modified based on the deviation information about the output channel, the panning process of the corresponding channel is finished. In addition, the processes from operation 820, that is, obtaining the deviation information about each output channel, to operation 850 or 860, that is, modifying the panning gain that is to be applied to the corresponding channel, may be repeated as many times as the number of output channels.
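The branching of operations 820 to 860 may be sketched as follows. This is a minimal Python illustration only: the names `SpeakerPosition`, `deviation_info`, and `select_gain_modification`, as well as the tolerance of 0.5° below which a deviation is ignored, are hypothetical and are not part of the embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeakerPosition:
    azimuth_deg: float
    elevation_deg: float

def deviation_info(actual, standard):
    """Operation 820: deviation of the arranged speaker from the standard location."""
    return (actual.azimuth_deg - standard.azimuth_deg,
            actual.elevation_deg - standard.elevation_deg)

def select_gain_modification(actual, standard, tolerance_deg=0.5):
    """Operations 840 to 860: decide how the initial panning gain is modified.
    tolerance_deg is an assumed threshold, not a value from the embodiment."""
    d_az, d_el = deviation_info(actual, standard)
    if abs(d_el) > tolerance_deg:
        return "elevation"   # operation 860: compensate the elevation effect
    if abs(d_az) > tolerance_deg:
        return "azimuth"     # operation 850: 2D panning correction only
    return "none"            # speaker matches the standard layout

# FIG. 6B: L channel at 45 degrees instead of 30 degrees.
fig_6b = select_gain_modification(SpeakerPosition(-45.0, 0.0), SpeakerPosition(-30.0, 0.0))
# FIG. 7A: R channel raised by 30 degrees at the standard azimuth.
fig_7a = select_gain_modification(SpeakerPosition(30.0, 30.0), SpeakerPosition(30.0, 0.0))
```

In this sketch, the function would be called once per output channel, which corresponds to repeating operations 820 to 860 for each output channel.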

FIG. 9 is a diagram showing an elevation deviation versus a panning gain with respect to each channel, when a center channel signal is rendered from a left channel signal and a right channel signal, according to an embodiment.

FIG. 9 shows, as an embodiment of the elevation effect compensator 124, a relation between the elevation angle and the panning gains that are to be applied to a channel having the elevation angle (elevated) and a channel on a horizontal plane (fixed).

When the C channel is rendered from the L channel and the R channel on the horizontal plane, panning gains g_(L) and g_(R) that will be applied to the L and R channels are equal to each other since the L channel and the R channel arranged on the horizontal plane are symmetric with each other, and each has a value of 0.707, that is, g_(L)=g_(R)=1/√2. However, if one of the channels has the elevation angle as shown in the example of FIG. 7, the panning gain has to be modified according to the elevation angle in order to compensate for the effect caused by the elevation increase.

In FIG. 9, the panning gain is modified to increase by a ratio of 8 dB/90° according to the change in the elevation angle. With respect to the example shown in FIG. 7, a gain of an elevated channel corresponding to the elevation angle of 30° is applied to the R channel, so that g_(R) is modified to 0.81, that is, increased from 0.707, and a gain of a fixed channel is applied to the L channel, so that g_(L) is modified to 0.58, that is, decreased from 0.707.

Here, the panning gains g_(L) and g_(R) have to satisfy Equation 2 below for energy normalization.

g_(L)² + g_(R)² = 1   (2)

According to the embodiment illustrated with reference to FIG. 9, the panning gain is modified to increase linearly by the ratio of 8 dB/90° according to the change in the elevation angle. However, the increasing ratio may vary depending on the example of the elevation effect compensator, or the panning gain may increase non-linearly.
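The linear compensation and the energy normalization may be sketched numerically as follows. The sketch rests on one possible reading of FIG. 9, stated here as an assumption: the elevated channel is boosted relative to the fixed channel by 8 dB × (elevation angle / 90°), and both gains are then scaled so that the sum of their squares is 1. The function name `elevation_compensated_gains` is hypothetical.

```python
import math

def elevation_compensated_gains(elevation_deg, slope_db_per_90=8.0):
    """Return (g_elevated, g_fixed) for a speaker pair rendering a center image.
    Assumption: the elevated channel is boosted relative to the fixed channel
    by slope_db_per_90 * elevation_deg / 90 dB, and both gains are then scaled
    so that g_elevated**2 + g_fixed**2 == 1 (energy normalization)."""
    boost_db = slope_db_per_90 * elevation_deg / 90.0
    ratio = 10.0 ** (boost_db / 20.0)          # amplitude ratio g_elevated / g_fixed
    g_fixed = 1.0 / math.sqrt(1.0 + ratio * ratio)
    g_elevated = ratio * g_fixed
    return g_elevated, g_fixed

# Example of FIG. 7: the R channel is elevated by 30 degrees, the L channel is fixed.
g_r, g_l = elevation_compensated_gains(30.0)
```

Under this reading, an elevation angle of 30° gives gains close to the values of 0.81 and 0.58 stated for FIG. 9, and an elevation angle of 0° returns the symmetric value of about 0.707 for both channels. A non-linear compensator would replace only the computation of `boost_db`.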

FIG. 10 is a diagram showing spectrums of tone colors at different locations, according to a positional deviation between the speakers.

The panning unit 123 and the elevation effect compensator 124 process the audio signals so that the sound image is not biased according to locations of the speakers corresponding to the output channels, but is located at an original location. However, if the locations of the speakers corresponding to the output channels actually change, not only the sound image but also the tone color is changed.

Here, a spectrum of the tone color that a human being perceives according to the location of the sound image may be obtained based on an HRTF, which is a function for transferring the sound image at a certain spatial location to human ears. The HRTF may be obtained by performing a Fourier transformation on a head-related impulse response (HRIR) obtained in a time domain.
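The relation between the HRIR and the HRTF may be illustrated with a direct discrete Fourier transform. The following Python sketch uses only the standard library; the function names and the 8-tap impulse response are hypothetical and do not represent measured data.

```python
import cmath
import math

def hrtf_from_hrir(hrir):
    """Discrete Fourier transform of a head-related impulse response.
    Returns the complex bins 0..N//2 (DC up to the Nyquist frequency)."""
    n = len(hrir)
    return [sum(h * cmath.exp(-2j * math.pi * k * i / n) for i, h in enumerate(hrir))
            for k in range(n // 2 + 1)]

def magnitude_db(spectrum, floor=1e-12):
    """Magnitude of each bin in dB, with a small floor to avoid log10(0)."""
    return [20.0 * math.log10(max(abs(c), floor)) for c in spectrum]

# Hypothetical HRIR: a delayed impulse attenuated to half amplitude.
hrir = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
spectrum_db = magnitude_db(hrtf_from_hrir(hrir))
```

For this illustrative impulse response the magnitude is flat across frequency (about −6 dB in every bin), whereas a measured HRIR would show the direction-dependent peaks and notches discussed in this section.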

As an audio signal from a spatial audio source propagates through the air and passes through an auricle, an external auditory canal, and an eardrum, a magnitude or a phase of the audio signal is changed. In addition, since a listener is also located in a sound field, the audio signal that is transferred is also changed by a head, a torso, or the like of the listener. Therefore, the listener finally listens to a distorted audio signal. Here, a transfer function of the audio signal that the listener listens to, in particular, between an acoustic pressure and the audio signal, is referred to as the HRTF.

Since each person has a unique size and shape of head, auricle, and torso, the HRTF is unique to each person. However, since it is impossible to measure the HRTF from each person, the HRTF may be modelled by using a common HRTF, a customized HRTF, etc.

A diffraction effect of the head appears from about 600 Hz and is rarely seen above 4 kHz. A torso effect, which may be observed from 1 kHz to 2 kHz, increases when the audio source is located at an ipsilateral azimuth and the elevation angle of the audio source is low, and is observed up to 13 kHz, at which the auricle dominantly affects the sound image of the audio signal. Around a frequency of 5 kHz, a peak appears due to resonance of the auricle. In addition, a first notch due to the auricle appears within a range of 6 kHz to 10 kHz, a second notch due to the auricle appears within a range of 10 kHz to 15 kHz, and a third notch due to the auricle appears in a range of 15 kHz or greater.

In order to perceive the azimuth angle and the elevation angle, an ITD and an ILD of the audio source and peaks and notches shown in monaural spectral cues are used. The peaks and notches are generated due to the diffraction and dispersion of the torso, head, and auricle, and may be identified in the HRTF.

As described above, the HRTF varies depending on the azimuth angle and the elevation angle of the audio source. FIG. 10 shows a graph of the spectrum of tone color that a human being perceives according to a frequency of the audio source, in a case where the azimuth angle of the speaker is 30°, 60°, and 110°.

When the tone colors of the audio signals are compared according to the azimuth angles, the tone color at the azimuth angle of 30° has a more intense component at 400 Hz or less, by about 3 dB to about 5 dB, than the tone color at the azimuth angle of 60°. In addition, the tone color at the azimuth angle of 110° has a less intense component within a range of 2 kHz to 5 kHz, by about 3 dB, than the tone color at the azimuth angle of 60°.

Therefore, when the tone color conversion filtering is performed by using the characteristic of the tone color according to the azimuth angle, tone colors of a wideband signal provided to a listener may be similar to each other, and thus, the rendering may be performed more effectively.

FIG. 11 is a flowchart illustrating a method of rendering a stereophonic audio signal, according to an embodiment.

FIG. 11 is a flowchart illustrating an embodiment of the method of rendering the stereophonic audio signal, that is, a method of performing a tone color conversion filtering on an input channel when the input channel is panned to at least two output channels.

A multi-channel audio signal that is to be converted to a plurality of output channels is input to the filtering unit 121 (1110). When a predetermined input channel from the input multi-channel audio signal is panned to at least two output channels, the filtering unit 121 obtains a mapping relation between the predetermined input channel and the output channels to which the input channel is to be panned (1130).

The filtering unit 121 obtains a tone color filter coefficient based on an HRTF about a location of the input channel and locations of the output channels for panning based on the mapping relation, and performs a tone color correction filtering by using the tone color filter coefficient (1150).

Here, the tone color correction filter may be designed by the following processes.

FIG. 12 shows diagrams illustrating a method of designing a tone color correction filter, according to an embodiment.

It is assumed that the HRTF transferred to a listener when an azimuth angle of the audio source is θ (degrees) is defined as $H_{\theta}$, and that an audio source having an azimuth angle of θ_(S) is panned (localized) to speakers located at azimuth angles of θ_(D1) and θ_(D2). In this case, the HRTFs with respect to the azimuth angles are respectively $H_{\theta_{S}}$, $H_{\theta_{D1}}$, and $H_{\theta_{D2}}$.

The purpose of the tone color correction is to correct the sound reproduced from the speakers located at the azimuth angles of θ_(D1) and θ_(D2) to have a tone color similar to that of the sound at the azimuth angle θ_(S), and thus, an output signal from the azimuth angle θ_(D1) passes through a filter having a transfer function such as

$\frac{H_{\theta_{S}}}{H_{\theta_{D1}}},$

and an output signal from the azimuth angle θ_(D2) passes through a filter having a transfer function such as

$\frac{H_{\theta_{S}}}{H_{\theta_{D2}}}.$

As a result of the above filtering, the sound reproduced from the speakers located at the azimuth angles θ_(D1) and θ_(D2) may be corrected to have tone colors similar to that of the sound from the azimuth angle of θ_(S).
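The ratio filters above may be sketched per frequency bin as follows. This Python illustration assumes that the HRTFs are available as lists of complex frequency-bin values; the function name `tone_color_correction`, the regularization term `eps`, and the three-bin example values are hypothetical and are not measured data.

```python
def tone_color_correction(h_source, h_speaker, eps=1e-9):
    """Per-bin correction filter H_source / H_speaker.
    eps is an assumed regularization term that keeps the division finite
    when the speaker-direction HRTF has a deep notch."""
    if len(h_source) != len(h_speaker):
        raise ValueError("HRTFs must have the same number of frequency bins")
    return [s * d.conjugate() / (abs(d) ** 2 + eps)
            for s, d in zip(h_source, h_speaker)]

# Hypothetical three-bin HRTFs for the source direction and one speaker direction.
h_s = [1.0 + 0.0j, 0.5 + 0.5j, 0.25 + 0.0j]
h_d1 = [2.0 + 0.0j, 0.5 + 0.5j, 0.5 + 0.0j]

corr_d1 = tone_color_correction(h_s, h_d1)
# Sound played from the speaker direction and corrected by the filter
# approximates the spectrum of the source direction.
restored = [c * d for c, d in zip(corr_d1, h_d1)]
```

The same function would be applied a second time with the HRTF of the other speaker direction to obtain the filter for the second output channel.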

In the example of FIG. 10, when the tone colors of the audio signals from the azimuth angles are compared with one another, the tone color at the azimuth angle of 30° has a more intense component at 400 Hz or less, by about 3 dB to about 5 dB, than that of the azimuth angle of 60°, and the tone color at the azimuth angle of 110° has a smaller component within a range of 2 kHz to 5 kHz, by about 4 dB, than that of the azimuth angle of 60°.

Since the purpose of the tone color correction is to correct the sound reproduced from the speakers located at the angles of 30° and 110° to have a tone color similar to that of the sound reproduced at the angle of 60°, the component at 400 Hz or less in the sound reproduced from the speaker at the angle of 30° is reduced by 4 dB in order to make the tone color similar to that of the sound at the angle of 60°, and the component within the range of 2 kHz to 5 kHz in the sound reproduced from the speaker located at the angle of 110° is increased by 4 dB in order to make the tone color similar to that of the sound at the angle of 60°.

FIG. 12A shows a tone color correction filter that is to be applied to an audio signal from the azimuth angle of 60° to be reproduced through the speaker at the azimuth angle of 30°, wherein the sound quality correction filter is applied to an entire frequency section, that is, a ratio

$\frac{H_{60}}{H_{30}}$

between the spectrum (HRTF) of the tone color when the azimuth angle is 60° and the spectrum (HRTF) of the tone color when the azimuth angle is 30°, as shown in FIG. 10.

In FIG. 12A,

$\frac{H_{60}}{H_{30}}$

becomes a filter that reduces a magnitude of a signal by 4 dB at a frequency of 500 Hz or less, increases the magnitude of the signal by 5 dB at a frequency between 500 Hz and 1.5 kHz, and by-passes the signal of the other frequency domain, similarly to the above description.

FIG. 12B shows a sound quality correction filter that is to be applied to an audio signal from the azimuth angle of 60° to be reproduced through the speaker at the azimuth angle of 110°, wherein the sound quality correction filter is applied to the entire frequency section, that is, a ratio

$\frac{H_{60}}{H_{110}}$

between the spectrum (HRTF) of the tone color when the azimuth angle is 60° and the spectrum (HRTF) of the tone color when the azimuth angle is 110°, as shown in FIG. 10.

In FIG. 12B,

$\frac{H_{60}}{H_{110}}$

becomes a filter that increases the magnitude of the signal at the frequency of 2 kHz to 7 kHz by 4 dB and by-passes the signal of the other frequency domain, similarly to the above description.
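The band-wise behavior described for FIGS. 12A and 12B may be written down as simple gain curves. The following Python sketch encodes only the band edges and gain values stated above; the function names and the treatment of the exact band edges (inclusive or exclusive) are assumptions.

```python
def fig_12a_gain_db(freq_hz):
    """Approximate gain of H_60/H_30 as described for FIG. 12A."""
    if freq_hz <= 500.0:
        return -4.0      # reduce by 4 dB at 500 Hz or less
    if freq_hz <= 1500.0:
        return 5.0       # increase by 5 dB between 500 Hz and 1.5 kHz
    return 0.0           # by-pass elsewhere

def fig_12b_gain_db(freq_hz):
    """Approximate gain of H_60/H_110 as described for FIG. 12B."""
    return 4.0 if 2000.0 <= freq_hz <= 7000.0 else 0.0

def db_to_amplitude(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

A frequency-domain implementation could multiply each bin of the signal by `db_to_amplitude` of the corresponding gain; a practical filter would typically smooth the transitions between bands.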

FIG. 13 shows diagrams of cases where there is an elevation deviation between an output channel and a virtual audio source in a 3D virtual rendering.

A virtual rendering is a technique for reproducing 3D sound from a 2D output system such as the 5.1-channel system, that is, a rendering technique for forming a sound image at a virtual location where there is no speaker, in particular, at a location having an elevation angle.

Virtual rendering techniques that provide an elevation perception by using 2D output channels basically include two operations, that is, an HRTF correction filtering and a multi-channel panning coefficient distribution. The HRTF correction filtering denotes a tone color correction operation for providing a user with the elevation perception, that is, it performs functions similar to those of the tone color correction filtering described above with reference to FIGS. 10 to 12.

Here, as shown in FIG. 13A, it is assumed that the output channels are arranged on a horizontal plane, and an elevation angle φ of a virtual audio source is 35°. In this case, an elevation difference between an L channel, that is, a reproducing output channel, and the virtual audio source is 35°, and the HRTF with respect to the virtual audio source may be defined as H_(E(35)).

On the contrary, as shown in FIG. 13B, it is assumed that the output channel has a greater elevation angle. In this case, although the elevation difference between the L channel, that is, the reproducing output channel, and the virtual audio source is 35°, the output channel has the greater elevation angle, and thus, the HRTF with respect to the virtual audio source may be defined as H_(E(−35)).

Here, a relationship expressed by an equation

$H_{E{({- 35})}} = \frac{1}{H_{E{(35)}}}$

may be obtained. In addition, if there is no elevation difference between the virtual audio source and the output channel, the tone color correction by using the elevation correction filter H_(E(φ)) is not performed.

The above rendering operation may be generalized as shown in Table 1 below.

TABLE 1

Elevation angle of      Elevation angle of        Whether to use tone       Filter type
virtual audio source    reproduction speaker      color conversion filter   (filter coefficient)
                        (output channel)
0°                      0°                        Not used
0°                      φ°                        Used                      $\frac{1}{H_{E{(\phi)}}}$
φ°                      0°                        Used                      H_(E(φ))
φ°                      φ°                        Not used

Here, a case where the tone color conversion filter is not used is the same as a case where a by-pass filtering is performed. Table 1 above may be applied to a case where the elevation difference is within a predetermined range from φ, as well as a case where the elevation difference is exactly φ or −φ.
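The selection logic of Table 1 may be sketched as a small lookup. In the following Python illustration, the function name `select_elevation_filter`, the default value of φ (35°, taken from the example of FIG. 13), and the tolerance of 5° that stands in for the "predetermined range" are assumptions.

```python
def select_elevation_filter(source_elev_deg, speaker_elev_deg,
                            phi_deg=35.0, tolerance_deg=5.0):
    """Map the elevation angles of the virtual audio source and of the
    reproduction speaker to a filter choice according to Table 1.
    Returns "bypass", "H_E" or "1/H_E". tolerance_deg is an assumed range."""
    def is_elevated(angle):
        return abs(angle - phi_deg) <= tolerance_deg

    def is_horizontal(angle):
        return abs(angle) <= tolerance_deg

    if is_elevated(source_elev_deg) and is_horizontal(speaker_elev_deg):
        return "H_E"        # elevated source, horizontal speaker
    if is_horizontal(source_elev_deg) and is_elevated(speaker_elev_deg):
        return "1/H_E"      # horizontal source, elevated speaker
    if (is_elevated(source_elev_deg) and is_elevated(speaker_elev_deg)) or \
       (is_horizontal(source_elev_deg) and is_horizontal(speaker_elev_deg)):
        return "bypass"     # no elevation difference
    raise ValueError("elevation pair not covered by Table 1")

case_13a = select_elevation_filter(35.0, 0.0)   # FIG. 13A
case_13b = select_elevation_filter(0.0, 35.0)   # FIG. 13B
```

Elevation pairs that fall outside both ranges are rejected here rather than guessed, since Table 1 does not specify a filter for them.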

FIG. 14 is a diagram illustrating a virtual rendering of a TFC channel by using L/R/LS/RS channels, according to an embodiment.

The TFC channel is located at an azimuth angle of 0° and an elevation angle of 35°, and locations of horizontal channels L, R, LS, and RS for virtually rendering the TFC channel are as shown in FIG. 14 and Table 2 below.

TABLE 2

Speaker             Azimuth angle    Elevation angle
(output channel)    (azimuth)        (elevation)
L                   −45°             35°
R                   30°              0°
LS                  −110°            0°
RS                  135°             0°

As shown in FIG. 14 and Table 2 above, the R channel and the LS channel are arranged according to the standard layout, the RS channel has an azimuth deviation of 25°, and the L channel has an elevation deviation of 35° and an azimuth deviation of 15°.

The method of applying the virtual rendering to the TFC channel by using the L/R/LS/RS channels according to an embodiment is performed in the following order.

Firstly, a panning coefficient is calculated. The panning gain may be calculated by loading initial values for virtual rendering of the TFC channel, wherein the initial values are stored in a storage, or by using a 2D rendering, a VBAP, etc.

Secondly, the panning coefficient is modified (corrected) according to the arrangement of channels. When the layout of the output channels is as shown in FIG. 14, since the L channel has the elevation deviation, a panning gain that is modified by the elevation effect compensator 124 is applied to the L channel and the R channel for performing a pair-wise panning using the L-R channels. On the other hand, since the RS channel has the azimuth deviation, a panning coefficient that is modified by a general method is applied to the LS channel and the RS channel for performing the pair-wise panning using the LS-RS channels.

Thirdly, the tone color is corrected by the tone color conversion filter. Since the R channel and the LS channel are arranged according to the standard layout, a filter H_(E) that is the same as that of the original virtual rendering is applied thereto.

Since the RS channel has only the azimuth deviation and no elevation deviation, the filter H_(E) that is the same as that of the original virtual rendering operation is used, together with a filter H_(M110)/H_(M135) for correcting the component shifted from 110°, which is the azimuth angle of the RS channel according to the standard layout, to the azimuth angle of 135°. Here, H_(M110) is an HRTF with respect to the audio source at the angle of 110° and H_(M135) is an HRTF with respect to the audio source at the angle of 135°. However, in this case, since the azimuth angles of 110° and 135° are relatively close to each other, the TFC channel signal rendered to the RS output channel may be by-passed.

The L channel has both the azimuth deviation and the elevation deviation from the standard layout, and thus, the filter H_(E) that would originally be applied for performing the virtual rendering is not applied; instead, a filter H_(T000)/H_(T045) for compensating between the tone color of the TFC channel and the tone color at the location of the L channel is applied. Here, H_(T000) is an HRTF with respect to the standard layout of the TFC channel, and H_(T045) is an HRTF with respect to the location where the L channel is arranged. Alternatively, in the above case, since the location of the TFC channel and the location of the L channel are relatively close to each other, it may be determined to by-pass the TFC channel signal rendered to the L output channel.

The rendering unit generates an output signal by filtering the input signal and multiplying the filtered signal by the panning gain, and the panning unit and the filtering unit operate independently of each other. This will become clear with reference to the block diagram of FIG. 15.

FIG. 15 is a block diagram of a renderer that processes a deviation in a virtual rendering by using 5.1 output channels, according to an embodiment.

The block diagram of the renderer shown in FIG. 15 illustrates an output and a process of each block when the L/R/LS/RS output channels arranged according to the layout of FIG. 14 are used to perform the virtual rendering of the TFC channel, as in the embodiment illustrated with reference to FIG. 14.

The panning unit firstly calculates a virtual rendering panning gain in the 5.1 channels. In the embodiment shown in FIG. 14, the panning gain may be determined by loading initial values that are set to perform the virtual rendering of the TFC channel by using the L/R/LS/RS channels. Here, the panning gains determined to be applied to the L/R/LS/RS channels are g_(L0), g_(R0), g_(LS0), and g_(RS0).

In a next block, the panning gains between the L-R channels and the LS-RS channels are modified based on the deviation between the standard layout of the output channels and the arrangement layout of the output channels.

In a case of the LS-RS channels, since the RS channel only has the azimuth deviation, the panning gains may be modified by a general method. Modified panning gains are g_(LS) and g_(RS). In a case of the L-R channels, since the L channel has the elevation deviation, the panning gains are modified by the elevation effect compensator 124 for correcting the elevation effect. Modified panning gains are g_(L) and g_(R).

The filtering unit 121 receives an input signal X_(TFC), and performs the filtering operation with respect to each channel. Since the R channel and the LS channel are arranged according to the standard layout, the filter H_(E) that is the same as that of the original virtual rendering operation is applied thereto. Here, outputs from the filter are X_(TFC,R) and X_(TFC,LS).

Since the RS channel has no elevation deviation and only has the azimuth deviation, the filter H_(E) that is the same as that of the original virtual rendering is used, and a correction filter H_(M110)/H_(M135) is applied to a component that is shifted from the azimuth angle of 110° of the RS channel according to the standard layout to the angle of 135°. Here, an output signal from the filter is X_(TFC,RS).

The L channel has both the azimuth deviation and the elevation deviation with respect to the standard layout, and thus, the filter H_(E) that is originally applied for performing the virtual rendering is not applied, but a filter H_(T000)/H_(T045) is applied for correcting a tone color of the TFC channel and a tone color at the location of the L channel. Here, an output signal from the filter is X_(TFC,L).

The output signals from the filters applied respectively to the channels, that is, X_(TFC,L), X_(TFC,R), X_(TFC,LS), and X_(TFC,RS), are multiplied by the panning gains g_(L), g_(R), g_(LS), and g_(RS) that are modified by the panning unit, so that signals y_(TFC,L), y_(TFC,R), y_(TFC,LS), and y_(TFC,RS) are output from the renderer with respect to the channel signals.
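The final stage of the renderer, in which each filtered signal is multiplied by its modified panning gain, may be sketched as follows. This Python illustration models each tone color filter as a short FIR filter; the function names, the filter taps, and the gain values are hypothetical placeholders and do not correspond to the filters or gains of the embodiment.

```python
def fir_filter(signal, taps):
    """Direct-form FIR filtering; the output has the same length as the input."""
    return [sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(signal))]

def render_tfc(x_tfc, filters, gains):
    """For every output channel ch, filter x_tfc with the filter of ch and
    multiply the filtered signal by the modified panning gain of ch."""
    return {ch: [gains[ch] * v for v in fir_filter(x_tfc, filters[ch])]
            for ch in gains}

# Hypothetical input, filter taps, and modified panning gains.
x_tfc = [1.0, 0.0, 0.0, 0.0]
filters = {"L": [1.0], "R": [0.5, 0.5], "LS": [0.5, 0.5], "RS": [1.0]}
gains = {"L": 0.7, "R": 0.5, "LS": 0.5, "RS": 0.1}   # squares sum to 1

y_tfc = render_tfc(x_tfc, filters, gains)
```

Because the filtering and the gain multiplication are separate steps in this sketch, the filters and the gains can be determined independently, which mirrors the independent operation of the filtering unit and the panning unit.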

The embodiments according to the present invention can also be embodied as programmed commands to be executed in various computer configuration elements, and then can be recorded to a computer readable recording medium. The computer readable recording medium may include one or more of the programmed commands, data files, data structures, or the like. The programmed commands recorded to the computer readable recording medium may be particularly designed or configured for the invention or may be well known to one of ordinary skill in the art of computer software fields. Examples of the computer readable recording medium include magnetic media including hard disks, magnetic tapes, and floppy disks, optical media including CD-ROMs and DVDs, magneto-optical media including floptical disks, and a hardware apparatus designed to store and execute the programmed commands in read-only memory (ROM), random-access memory (RAM), flash memories, and the like. Examples of the programmed commands include not only machine codes generated by a compiler but also high-level language codes to be executed in a computer by using an interpreter. The hardware apparatus can be configured to function as one or more software modules so as to perform operations for the invention, or vice versa.

While the detailed description has been particularly described with reference to non-obvious features of the present invention, it will be understood by one of ordinary skill in the art that various deletions, substitutions, and changes in form and details of the aforementioned apparatus and method may be made therein without departing from the spirit and scope of the following claims.

Therefore, the scope of the present invention is defined not by the detailed description but by the appended claims, and all differences within the scope will be construed as being included in the present invention.

1-19. (canceled)
 20. A method of rendering an audio signal, the method comprising: receiving multi-channel signals including one or more height input channels, to be converted from input channels configurations to output channel configurations; obtaining a panning gain for a height input channel to be converted into an output channel based on a standard loudspeaker position; obtaining deviation information about the output channel, from an output loudspeaker position and the standard loudspeaker position; and modifying the obtained panning gain based on the obtained deviation information and the standard loudspeaker position.
 21. The method of claim 20, wherein the deviation information includes at least one of an elevation deviation and an azimuth deviation, and wherein the panning gain is modified to keep a central image corresponding to an azimuth of the height input channel.
 22. The method of claim 20, wherein plurality of output channels included in the output channel configurations are horizontal channels.
 23. The method of claim 20, wherein the output channel comprises at least one of a left horizontal channel and a right horizontal channel.
 24. The method of claim 21, wherein the modifying of the panning gain compensates an effect caused by an elevation deviation, when the obtained deviation information comprises the elevation deviation.
 25. The method of claim 21, wherein the modifying of the panning gain compensates the panning gain by a two-dimensional (2D) panning method, when the obtained deviation information does not comprise the elevation deviation.
 26. The method of claim 24, wherein the compensating of the effect caused by the elevation deviation comprises compensating an inter-aural level difference (ILD) resulting from the elevation deviation.
 27. The method of claim 23, wherein the modified panning gain is proportional to the obtained elevation deviation.
 28. The method of claim 20, wherein a sum of square values of modified panning gains with respect to plurality of output channels included in the output channel configurations for each of plurality of input channels included in the input channel configurations is 1.
 29. An apparatus for rendering an audio signal, the apparatus comprising: a receiver configured to receive a multi-channel signals including one or more height input channels, to be converted from input channel configurations to output channel configurations; a deviation obtaining unit configured to obtain deviation information about an output channel, from an output loudspeaker position and a standard loudspeaker position; and a panning gain obtaining unit configured to obtain a panning gain for a height input channel to be converted into the output channel based on the standard loudspeaker position and to modify the obtained panning gain based on the deviation information and the standard loudspeaker position.
 30. The apparatus of claim 29, wherein the deviation information includes at least one of an elevation deviation and an azimuth deviation, and wherein the panning gain is modified to keep a central image corresponding to an azimuth of the height input channel.
 31. The apparatus of claim 29, wherein the plurality of output channels are horizontal channels.
 32. The apparatus of claim 29, wherein the output channel comprises at least one of a left horizontal channel and a right horizontal channel.
 33. The apparatus of claim 30, wherein the panning gain obtaining unit compensates an effect caused by an elevation deviation, when the obtained deviation information comprises the elevation deviation.
 34. The apparatus of claim 30, wherein the panning gain obtaining unit compensates the panning gain by a two-dimensional (2D) panning method, when the obtained deviation information does not comprise the elevation deviation.
 35. The apparatus of claim 33, wherein the panning gain obtaining unit compensates an inter-aural level difference caused by the elevation deviation to compensate an effect caused by the elevation deviation.
 36. The apparatus of claim 30, wherein the panning gain is proportional to the obtained elevation deviation.
 37. The apparatus of claim 29, wherein a sum of square values of modified panning gains with respect to plurality of output channels included in the output channel configurations for each of plurality of input channels included in the input channel configurations is 1.
 38. A computer-readable recording medium having recorded thereon a computer program for executing the method of claim 20.